Assortative mixing in Protein Contact Networks and protein folding kinetics
Starting from linear chains of amino acids, the spontaneous folding of proteins into their elaborate
three-dimensional structures is one of the remarkable examples of biological self-organization.
We investigated the native-state structures of 30 single-domain, two-state proteins from a
complex-networks perspective, to understand the role of topological parameters in the proteins'
folding kinetics at two length scales: as "Protein Contact Networks (PCNs)" and as their
corresponding "Long-range Interaction Networks (LINs)", constructed by ignoring the short-range
interactions.

Our results show that both PCNs and LINs exhibit the exceptional topological property of
"assortative mixing", which is absent in all other biological and technological networks studied so
far. We show that the degree distribution of these contact networks is partly responsible for the
observed assortativity. The coefficient of assortativity also shows a positive correlation with the
rate of protein folding at both short and long contact scales, whereas the clustering coefficients
of only the LINs exhibit a negative correlation. The results indicate that the general topological
parameters of these naturally evolved protein networks can effectively represent the structural and
functional properties required for fast information transfer among the residues, facilitating
biochemical and kinetic functions such as allostery, stability, and the rate of folding.

Supplementary Information: Supplementary data are available at Bioinformatics online.
I. INTRODUCTION

Inside the cell, proteins are synthesized as linear chains
of amino acids, which fold into unique three-dimensional
structures ('native states'). The wide range of biochemical
functions performed by proteins is specified by their
detailed structures. Despite the large number of degrees of
freedom, proteins fold into their native states in a
surprisingly short time, which is known as Levinthal's
Paradox [1]. Although, given suitable conditions, some
small proteins can reach their native state in a single
concerted step, many others fold in stages, with initial
conformational events occurring long before the final
('native') structure appears [2]. Structural changes and
chemical interactions occur throughout the entire folding
process, and strongly cooperative mechanisms are necessary
to bring the protein into its native conformation within a
very short time period [3]. The fast folding is a result of
the catalytic effect of the formation of clusters of
residues in contact with each other, which have high
preferences for the early formation of secondary structures
(helices, sheets, and loops) in the presence of significant
amounts of long-range tertiary structure interactions [4].

The folding mechanism, kinetics, structure and function of
proteins are intimately related to each other. Misfolding
of proteins into non-native structures can lead to several
disorders [5]. Correlating sequence with structure, as well
as understanding folding kinetics, has been an area of
intense activity for experimentalists and theoreticians
[6, 7]. Among the different theoretical approaches used for
studying protein structure, function, and folding kinetics,
the graph-theoretical approach, based on perspectives from
complex networks, has been used recently to study protein
structures [8, 9, 10, 11, 12, 13, 14, 15, 16, 17].

It is known that folding mechanisms are largely determined
by a protein's topology rather than its inter-atomic
interactions [18]. With that understanding, we build
graph-theoretical models of protein structures to
investigate various topological properties at two different
length scales, and study their possible role in the
kinetics of protein folding. We use a coarse-grained
complex network model of a protein structure, viz. the
Protein Contact Network (PCN), which ignores the
fine-grained atomic-level details and models the
three-dimensional structure as a system of amino acid
units held in place by noncovalent interactions. Long-range
interactions are known to play a distinct role in
determining the tertiary structure of proteins [19], as
opposed to the short-range interactions, which could
largely contribute to secondary structure formation. We
therefore also consider the Long-range Interaction Network
(LIN) of each protein, which is a subset of the
corresponding PCN constructed by ignoring the short-range
interactions. The idea behind studying LINs is to
understand the contribution of the long-range interactions
to the topological properties, and their correlation with a
biophysically relevant property, viz. the rate of protein
folding.

This study aims to address the question: can general
network parameters, derived from native-state structures
of proteins, uncover features of the relationship between
the structural properties and the folding kinetics of the
proteins? To study this, we choose single-domain,
two-state folding proteins that belong to different
structural classes [20], for which the kinetic parameter,
the rate of folding ($k_F$), is available. Our analysis of
the coarse-grained network representations of protein
structures uncovers the exceptional topological property
of a high degree of assortative mixing at both length
scales (PCN and LIN) in these naturally occurring,
evolutionarily selected biological networks. Assortative
mixing in LINs indicates that this feature in PCNs is
independent of short-range interactions. The coefficient
of assortativity [21], a measure of assortative mixing, is
also found to be considerably high for both PCNs and LINs.
By constructing appropriate control networks, we further
demonstrate that the degree (connectivity) distribution of
the PCNs alone can partially account for the presence of
assortativity in these networks.

To assess the contribution of these global parameters,
obtained from the coarse-grained network model of protein
structures, to their biophysical properties, we show that
the coefficient of assortativity of both PCNs and LINs
tends to correlate positively with the experimentally
determined rate of folding of these proteins. This implies
that assortative mixing, which tends to connect
highly connected residues to other residues with many
contacts, may assist in speeding up the folding process.
In contrast, the average clustering coefficients of LINs
show a good negative correlation with the rate of folding,
indicating that clustering of amino acids that participate
in long-range interactions into cliques slows down the
folding process. Interestingly, the average clustering
coefficients of PCNs show negligible correlation, which
suggests that the short-range interactions can reduce this
negative effect on the folding kinetics.

Three parameters, CO (Contact Order) [22], LRO (Long Range
Order) [23], and TCD (Total Contact Distance) [24], based
on the sequence distance per contact and/or the total
number of contacts per residue of the proteins, have also
been shown to correlate negatively with the rate of
folding [22, 23, 24]. The accuracy with which LRO and TCD
predict the rate of folding remains unchanged if
short-range interactions are not included in the
calculation. Here, along with delineating the role of
long-range interactions, we attempt to show that general
network parameters such as the clustering coefficient and
assortativity, which are widely used for networks of
diverse origins (technological, biological and social),
can not only give an insight into structural properties
but can also be used as indicators of specific biophysical
processes such as protein folding.



be in spatial contact ('link') if their Cα atoms lay within
a threshold distance ($R_c < 8$ Å) of each other.

The Long-range Interaction Network (LIN) of a PCN
was obtained by considering, other than the backbone
links, only those 'contacts' which occur between amino
acids that are 'distant' (i.e. separated by 12 or more
amino acids) from each other along the backbone. Thus
formed, a LIN is a subset of its PCN with the same number
of nodes ($n_r$) but fewer links, owing to the removal
of the short-range contacts.
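
As an illustration of the construction described above, the following Python sketch builds a PCN and its LIN from a list of Cα coordinates. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_pcn` and `build_lin` are hypothetical, backbone links between sequence neighbours are added explicitly, and "separated by 12 or more amino acids" is assumed to mean a sequence separation |i − j| ≥ 12.

```python
import math
from itertools import combinations


def build_pcn(ca_coords, cutoff=8.0):
    """Return PCN links as a set of (i, j) pairs with i < j.

    ca_coords: list of (x, y, z) C-alpha positions in angstroms, in chain order.
    cutoff: contact threshold in angstroms (8 A in the text).
    """
    links = set()
    for i, j in combinations(range(len(ca_coords)), 2):
        is_backbone = (j - i == 1)  # assumption: backbone links are always kept
        if is_backbone or math.dist(ca_coords[i], ca_coords[j]) < cutoff:
            links.add((i, j))
    return links


def build_lin(pcn_links, min_separation=12):
    """Keep backbone links plus contacts between sequence-distant residues.

    min_separation: assumed reading of 'separated by 12 or more amino acids'.
    """
    return {(i, j) for (i, j) in pcn_links
            if j - i == 1 or j - i >= min_separation}
```

By construction the LIN has the same node set as the PCN and is a subset of its links, so `build_lin(pcn) <= pcn` always holds.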

Two types of random controls were created for the
PCNs of the proteins. The polypeptide backbone
connectivity was kept intact in both random controls,
while the noncovalent contacts were randomized. For every
protein, 100 instances of each type of random control were
generated from its PCN. The parameters and properties
averaged over all instances were taken as representative
and compared with those of the PCNs and their LINs.

Type I: This random control network has the same
number of residues ($n_r$) and number of links/contacts
($n_c$) as the PCN, except that the contacts were
created randomly, avoiding duplicate and self contacts.
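
A Type I control of this kind can be sketched as follows. This is only an illustrative sketch: the function name `type1_control` is hypothetical, the PCN is assumed to be given as a set of (i, j) pairs with i < j that already contains every backbone link (i, i + 1), and uniform sampling of residue pairs is an assumption about how "randomly" is realised.

```python
import random


def type1_control(n_residues, pcn_links, rng=None):
    """Randomly re-place the non-backbone contacts of a PCN.

    The backbone links (i, i + 1) are kept intact; the remaining contacts
    are drawn uniformly at random, without self contacts or duplicates.
    """
    if rng is None:
        rng = random.Random()
    backbone = {(i, i + 1) for i in range(n_residues - 1)}
    n_random = len(set(pcn_links) - backbone)  # contacts to be re-placed
    links = set(backbone)
    while len(links) < len(backbone) + n_random:
        i, j = rng.sample(range(n_residues), 2)  # two distinct residues
        links.add((min(i, j), max(i, j)))        # a set cannot hold duplicates
    return links
```

The result has the same number of nodes and links as the input PCN, but its degree distribution is generally different.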

Type II: Apart from maintaining the number of nodes
($n_r$) and contacts ($n_c$), the connectivity distribution
of the PCN was also conserved in this control network.
To ensure adequate randomization, the pattern of
pair-connectivity was randomized 2000 times.
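
One common way to randomize pair-connectivity while conserving every node's degree is repeated pairwise link swapping. The sketch below uses that technique as an assumption; the text does not specify the exact procedure. The name `type2_control` is hypothetical, the PCN is assumed to be a set of (i, j) pairs with i < j containing all backbone links, and "randomized 2000 times" is read here as 2000 successful swaps.

```python
import random


def type2_control(n_residues, pcn_links, n_swaps=2000, rng=None):
    """Degree-preserving randomization of the non-backbone contacts.

    Two contacts (a, b) and (c, d) are replaced by (a, d) and (c, b),
    which leaves the degree of every node unchanged.
    """
    if rng is None:
        rng = random.Random()
    backbone = {(i, i + 1) for i in range(n_residues - 1)}
    contacts = sorted(set(pcn_links) - backbone)
    present = set(contacts) | backbone
    done, attempts = 0, 0
    # The attempt cap guards against networks where few swaps are possible.
    while done < n_swaps and attempts < 100 * n_swaps and len(contacts) >= 2:
        attempts += 1
        x, y = rng.sample(range(len(contacts)), 2)
        a, b = contacts[x]
        c, d = contacts[y]
        if rng.random() < 0.5:      # pick one of the two possible rewirings
            c, d = d, c
        if a == d or c == b:        # would create a self contact
            continue
        new1 = (min(a, d), max(a, d))
        new2 = (min(c, b), max(c, b))
        if new1 in present or new2 in present:  # would duplicate a link
            continue
        present -= {contacts[x], contacts[y]}
        present |= {new1, new2}
        contacts[x], contacts[y] = new1, new2
        done += 1
    return present
```

Because a proposed link that coincides with a backbone link is already in `present`, such swaps are rejected and the backbone stays intact.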

Details of the construction methods, with illustrations,
are given in the Supplementary Data.



Data 

Except for Fig. 1, all studies have been done on
30 single-domain, two-state folding, globular proteins
whose experimental rates of folding ($\ln(k_F)$) are
available. The data include 5 all-α, 13 all-β, and 12 αβ
class proteins. The natural logarithms of the rates of
folding ($\ln(k_F)$) of these proteins vary between −1.48
and 9.8, so that the folding times ($1/k_F$) span nearly
five orders of magnitude. Sizes ($n_r$) of these proteins
range from 43 to 126 amino acids. The structural data for
these studies were obtained from the Protein Data Bank
[25]. The preliminary network analysis (shown in Fig. 1)
was done on 110 proteins ($43 \le n_r \le 2359$) from the
major structural classes, which include the 30
single-domain proteins mentioned above.



II. METHODS 



Network parameters 



Construction of PCN, LIN, and their Random 
Controls 



The Protein Contact Network (PCN) was modeled
from the native-state protein structures as available in
the PDB [25]. The Cα atom of each amino acid was
considered a 'node', and any two amino acids were said to



The following parameters were studied for the PCN, 
LIN, and their random controls. 

Shortest Path Length and Characteristic Path Length:
The shortest path length ($L_{ij}$) between any pair of
nodes $i$ and $j$ is the number of links that must be
traversed between them by the shortest route. The average
of all shortest path lengths, known as the 'characteristic
path length' ($L$), is an indicator of the compactness of
the network, and is defined as [26],

$$L = \frac{2}{n_r (n_r - 1)} \sum_{i=1}^{n_r - 1} \sum_{j=i+1}^{n_r} L_{ij},$$

where $n_r$ is the number of residues in the network.
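
The characteristic path length, taken as the average shortest path length over all residue pairs, can be computed with a breadth-first search from every node. The sketch below is illustrative only: the function name is hypothetical, links are assumed to be (i, j) index pairs, and the network is assumed to be connected (which the intact backbone guarantees for PCNs and LINs).

```python
from collections import deque


def characteristic_path_length(n_nodes, links):
    """Average shortest path length over all unordered node pairs.

    Assumes a connected, unweighted, undirected network.
    """
    adj = [[] for _ in range(n_nodes)]
    for i, j in links:
        adj[i].append(j)
        adj[j].append(i)
    total = 0
    for source in range(n_nodes):
        dist = [-1] * n_nodes
        dist[source] = 0
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist[source + 1:])   # count each pair (i < j) once
    return 2 * total / (n_nodes * (n_nodes - 1))
```

For a linear chain of four nodes the six pairwise distances sum to 10, giving L = 10/6.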

Clustering Coefficient: The clustering coefficient is a
measure of the cliquishness of the network. The clustering
coefficient of a node $i$, $C_i$, is defined [26] as
$C_i = 2n / (k_i (k_i - 1))$, where $n$ denotes the number
of contacts amongst the $k_i$ neighbors of node $i$. The
average clustering coefficient of the network ($C$) is the
average of the $C_i$ of all the nodes in the network and is
referred to as the 'clustering coefficient' unless
specified otherwise.
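
The node-level and average clustering coefficients follow directly from this definition. The sketch below is illustrative; the function names are hypothetical, and assigning a clustering coefficient of 0 to nodes with fewer than two neighbours is an assumption, since the definition is undefined there.

```python
def clustering_coefficients(n_nodes, links):
    """Per-node clustering coefficient: 2n / (k (k - 1)).

    n is the number of links among the k neighbours of the node.
    """
    nbrs = [set() for _ in range(n_nodes)]
    for i, j in links:
        nbrs[i].add(j)
        nbrs[j].add(i)
    coeffs = []
    for i in range(n_nodes):
        k = len(nbrs[i])
        if k < 2:
            coeffs.append(0.0)  # assumption: undefined case treated as 0
            continue
        n = sum(1 for u in nbrs[i] for v in nbrs[i]
                if u < v and v in nbrs[u])
        coeffs.append(2 * n / (k * (k - 1)))
    return coeffs


def average_clustering(n_nodes, links):
    """Average of the per-node clustering coefficients."""
    coeffs = clustering_coefficients(n_nodes, links)
    return sum(coeffs) / len(coeffs)
```

For a triangle with one pendant node attached, the two triangle-only nodes have a coefficient of 1, the node carrying the pendant has 1/3, and the pendant node has 0.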

Degree and Remaining Degree: The degree ($k$) is defined
as the total number of neighbors a node is connected to.
Degree is one of the measures of the 'centrality' of a node
in the network: the larger the degree, the more important
the node. The remaining degree is one less than the total
degree of a node [21]. Other measures based on degree are
the maximum degree, $k_{max}$, the average degree,
$\langle k \rangle$, and the average degree of nearest
neighbors, $\langle k_{nn}(k) \rangle$.

Assortative Mixing and Coefficient of Assortativity:
A network is said to show assortative mixing if the
high-degree nodes in the network tend to be connected with
other high-degree nodes, and to be 'disassortative' when
the high-degree nodes tend to connect to low-degree nodes.
The Coefficient of Assortativity ($r$) measures this
tendency of degree correlation. It is the Pearson
correlation coefficient of the degrees at either end of a
link and is defined [21] as,

$$r = \frac{1}{\sigma_q^2} \sum_{jk} jk \, (e_{jk} - q_j q_k),$$

where $r$ is the coefficient of assortativity, $j$ and $k$
are the degrees of nodes, $q_j$ and $q_k$ are the remaining
degree distributions, $e_{jk}$ is the joint distribution of
the remaining degrees of the two nodes at either end of a
randomly chosen link, and $\sigma_q^2$ is the variance of
the distribution $q_k$.
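
Since the coefficient is a Pearson correlation over link ends, it can be computed directly from the list of links without building the joint distribution explicitly. The following is a minimal sketch under that reading; the function name is hypothetical, and raising an error when all remaining degrees are equal (zero variance) is a design choice made here.

```python
def assortativity(n_nodes, links):
    """Pearson correlation of remaining degrees at the two ends of a link.

    Each undirected link is counted in both orientations so that the
    measure is symmetric in its two ends.
    """
    deg = [0] * n_nodes
    for i, j in links:
        deg[i] += 1
        deg[j] += 1
    xs, ys = [], []
    for i, j in links:
        xs += [deg[i] - 1, deg[j] - 1]   # remaining degree = degree - 1
        ys += [deg[j] - 1, deg[i] - 1]
    m = len(xs)
    mean = sum(xs) / m                   # xs and ys have the same mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        raise ValueError("r is undefined when all link ends have equal degree")
    return cov / var
```

A star network, in which one hub connects only to degree-1 nodes, is the extreme disassortative case and gives r = −1.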



III. RESULTS 



Clustering coefficients of PCNs and LINs 



[Figure 1 plot. Vertical axis: Characteristic Path Length (L); horizontal axis: Clustering Coefficient (C). Legend: PCNs; LINs; (Type I) Random Controls of PCNs; LINs of (Type I) Random Controls.]

FIG. 1: L-C plot for 110 proteins from different structural
classes: PCNs (□), LINs (○), Type I Random Controls of
PCNs (•) and of their LINs (▲). Error bars in the random
controls data indicate standard deviations in L and C for
each protein computed over 100 instances.



PCNs from a large set of proteins have earlier been
shown [8, 10, 11, 12] to have a high degree of clustering,
which contributes to their "small-world" [26] nature. To
study whether the PCNs and their corresponding LINs have
similar topological properties, such as characteristic
path length ($L$) and clustering coefficient ($C$), we
plotted the $L$ versus $C$ graph in Fig. 1 for 110
proteins from the four major structural classes (i.e., α,
β, α + β, and α/β). The figure also shows their
corresponding Type I random controls. The Type II random
controls were found to be indistinguishable from the Type I
controls and are not shown in Fig. 1.

The results indicate two major differences between the
topological properties of the PCNs and their corresponding
LINs. The PCNs of these proteins have high clustering
coefficients ($C_{PCN} = 0.562 \pm 0.029$) compared to their
random controls, whereas the LINs show a distribution of
$C$ over a range ($C_{LIN} = 0.259 \pm 0.109$), even though
their random controls are almost indistinguishable from
those of the PCNs. $L$ and $C$ of the random controls of
PCNs were $2.621 \pm 0.411$ and $0.0557 \pm 0.0476$, and
those of their LINs were $3.256 \pm 0.056$ and
$0.075 \pm 0.012$. The LINs also have somewhat higher
characteristic path lengths ($L_{LIN} = 8.72 \pm 4.564$)
than the PCNs ($L_{PCN} = 5.818 \pm 2.826$), owing to their
reduced number of contacts. This indicates that the
differences in $C_{LIN}$ may assign specificity to the
protein networks at this length scale, which is otherwise
lost with the short-range contacts in PCNs that confer the
generic property of high clustering and compactness. The
role, if any, that the differential extent of clustering
in the protein contact networks at the two length scales
may play in the kinetics of the folding process is shown
later.



Degree distributions of PCNs and LINs 

The distribution of degrees in a network is an important
feature, which reflects the topology of the network and
is also a possible indicator of the processes by which
the network has evolved to attain its present topology.
Networks in which the links between any two nodes are
assigned randomly have a Poisson degree distribution [27],
with most of the nodes having similar degree.

FIG. 2: Normalized degree distributions P(k) of (a) PCNs and (b) LINs. Shown in the insets are (a) Type I Random Controls
of PCNs and (b) their LINs. Thick lines are the best-fit curves for the means of the data. Error bars indicate the standard
deviation of P(k) for nodes with degree k across the 30 proteins analyzed.

FIG. 3: Degree correlation pattern for (a) PCNs and (b) LINs. Assortative mixing of PCNs (□) and LINs (○) as compared
to Type I Random Controls of PCNs (•) and their LINs (▲), and Type II Random Controls of PCN (O) and their LINs (a).
Error bars indicate the standard deviation of $\langle k_{nn}(k) \rangle$ for nodes with degree k across the 30 proteins and
their controls.

Fig. 2 shows the normalized degree distributions of
PCNs and LINs of the 30 proteins studied. The frequencies
of nodes were scaled with the largest degree ($k_{max}$)
in the network (PCN or LIN) to obtain the $P(k)$ of a
given protein, so that proteins of different sizes can be
compared. As seen in Fig. 2(a), the PCNs have a Gaussian
degree distribution that best fits the equation

$$y(x) = \frac{A}{w \sqrt{\pi/2}} \exp\left( \frac{-2 (x - x_c)^2}{w^2} \right),$$

with $A = 5.538$, $w = 6.265$, and $x_c = 9.373$.

On the other hand, Fig. 2(b) shows that the degree
distribution of LINs is significantly different from that
of PCNs. In LINs, most nodes populate the low-degree region
and very few of them have high degrees. The best fit for
the LINs is a single-scale exponential function [10],
$P(k) \sim k^{-\gamma} \exp(-k/k_c)$, with $\gamma = 0.24$
and $k_c = 4.4$. The nodes of degree 1 in the LINs' degree
distributions are the N- and C-terminal amino acids at
either end of the protein backbone. As expected [23], the
Type I random controls of the PCNs (Fig. 2(a), inset) have
a Poisson degree distribution. LINs of Type I random
controls (Fig. 2(b), inset) also have a Poisson degree
distribution. The figure shows that these properties are
the same for proteins irrespective of their functions and
structural classifications.

FIG. 4: Histograms of the 'Coefficient of Assortativity (r)' of PCNs, Type I and Type II Random Controls of PCNs, and their
LINs. (a) and (b) PCNs and LINs (■) and their (Type I) Random Controls (□). (c) and (d) PCNs and LINs (■) and their
(Type II) Random Controls (□).



Assortative nature of PCNs and LINs 

The pattern of connectivity among nodes of varying degrees
can affect the interaction dynamics in the network, and the
degree correlation is used as a measure of the strength and
pattern of connectivity in a network. The average degree of
the nearest neighbors, $k_{nn}(k)$, of nodes of degree $k$
is a parameter by which one can measure and visualize the
degree correlation pattern in a network. In the presence of
correlations, $k_{nn}(k)$ increases with increasing $k$ for
an 'assortative network', and decreases with $k$ for a
'disassortative network' [28].
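
The degree correlation pattern can be tabulated by averaging, for each degree k, the mean neighbour degree of all nodes with that degree. The sketch below is illustrative; the function name `knn_by_degree` is hypothetical and links are assumed to be (i, j) index pairs.

```python
from collections import defaultdict


def knn_by_degree(n_nodes, links):
    """Map each degree k to the average nearest-neighbour degree k_nn(k)."""
    nbrs = [[] for _ in range(n_nodes)]
    for i, j in links:
        nbrs[i].append(j)
        nbrs[j].append(i)
    deg = [len(x) for x in nbrs]
    per_degree = defaultdict(list)
    for i in range(n_nodes):
        if deg[i] > 0:
            # mean degree of the neighbours of node i
            per_degree[deg[i]].append(sum(deg[v] for v in nbrs[i]) / deg[i])
    return {k: sum(v) / len(v) for k, v in sorted(per_degree.items())}
```

An increasing sequence of values with k indicates assortative mixing; a star network gives the opposite pattern, with the degree-1 nodes attached to the hub and the hub attached only to degree-1 nodes.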

Fig. 3 shows $\langle k_{nn}(k) \rangle$ versus $k$ plots
for the PCNs (a) and LINs (b) and the two types of random
controls. The nature of the curves for the PCNs (□ in
Fig. 3(a)) and their LINs (○ in Fig. 3(b)) shows that both
networks are characterized by 'assortative mixing', as the
average degree of the neighboring nodes increases with $k$.
The curves show a tendency to saturate at larger $k$, a
feature that may be due to the steric hindrance experienced
by the connecting amino acids in the three-dimensional
structural organization of the protein. This steric
hindrance restricts the position of an amino acid in the
three-dimensional conformational space, and results in a
maximum value of the degree ($k_{max}$) of a node. In
comparison, $\langle k_{nn}(k) \rangle$ remained almost
constant for the Type I random controls of both PCNs (•)
and LINs (▲), indicating a lack of correlations among the
nodes' connectivity in these controls.

The 'coefficient of assortativity' [21], $r$, is a global
quantitative measure of degree correlations in a network,
and takes values $-1 \le r \le 1$. $r$ is zero when there
are no correlations among the nodes' connectivity, and
takes positive or negative values for assortative or
disassortative mixing, respectively. The $r$ for both PCNs
and LINs of the 30 proteins were found to be positive,
indicating that the networks are assortative. Fig. 4 shows
the histograms of $r$ of (a) PCNs and (b) LINs, both in
(■), and their Type I random controls (□). The $r$ values
of both PCNs and LINs of all the proteins are significantly
high and positive (range: $0.09 < r < 0.52$ for PCNs, and
$0.12 < r < 0.58$ for LINs) when compared to other networks
of diverse origins [21]. Thus, the networks modelling the
native protein structures are clearly characterized by a
high degree of assortative mixing at both short and long
contact scales. The Type I random controls in Fig. 4
(a & b), for both PCNs and their LINs, are distributed
around zero, confirming the lack of degree correlations in
the controls observed in Fig. 3.

These properties of positive $r$ and assortative degree
correlations were also observed (data not shown) for a
large number of protein structures performing various
cellular functions and belonging to diverse structural
categories (used in [12]). This indicates that the
assortative mixing in PCNs and LINs is a generic feature
of protein structures. The role, if any, that the
assortative nature of the protein contact networks at both
length scales may play in the kinetics of the folding
process is shown later.



Degree distribution partially accounts for 
assortativity 

To investigate whether the patterns of connectivity in
the PCNs and LINs of the three-dimensional structures of
the proteins contribute towards the observed assortativity,
we studied the assortative mixing and the 'coefficient
of assortativity' of Type II random controls, in which
the degree distribution of the PCNs was preserved while
the pair-connectivities were randomized. Fig. 3 shows the
degree correlation plots of the Type II Random Controls
of PCN (O) and their LINs (a). It is clear that, unlike
the Type I random controls, the average degree of the
neighboring nodes increases with $k$ in the Type II Random
Controls, as seen for the PCNs and LINs.

The histograms of the 'coefficient of assortativity' ($r$)
of Type II Random Controls (□) are shown in Fig. 4
(c & d). Here also, it can be seen that the assortativity
is partially recovered in the Type II random controls
for both PCNs and their LINs. Thus the degree distribution
partially explains the observed assortative mixing. It
implies that preserving the degree distribution of the PCN,
even while randomizing the pair-connectivities, is
important to partially restore the assortative mixing in
the random controls of PCNs as well as their LINs. The
recovery of assortative mixing in the LINs of Type II
random controls of PCNs is even more surprising, as the
degree distribution of LINs (Fig. 2(b)) is very different
from that of the PCNs (Fig. 2(a)). This is especially
significant in the light of the observation [29] that one
can rewire the links in a (scale-free) network to obtain
assortativity or disassortativity, to any degree, without
any change in the degree distribution.



Correlation of protein network parameters to 
protein folding rates 

The general network parameters (e.g., $L$, $C$, and $r$)
have been used to shed light on the topology, growth
and dynamics of widely different networks: physical,
social and biological. Here we show the relationship of
these general topological parameters (specifically, $C$ and
$r$), obtained from our coarse-grained model of protein
structures (the PCNs and LINs), to a biophysical property
underlying the organization of the three-dimensional
structure of the protein chains, i.e., the kinetics of
protein folding. Below, we correlate the available
experimental data on the rate of folding of the 30 proteins
with the two network parameters, $C$ and $r$, of the PCNs
and their LINs.

FIG. 5: Rate of folding, $\ln(k_F)$, has a negative correlation,
as indicated by the trendline, with the Clustering Coefficient (C)
of LINs.



Average Clustering Coefficient and Rate of Folding 

Fig. 1 shows that the PCNs and their LINs differ in
their clustering coefficients ($C$), with PCNs having
similar but high $C$, and their LINs having $C$ distributed
over a range from low to medium values. We did not find
any significant relationship between the clustering
coefficient of the PCNs ($C_{PCN}$) and $\ln(k_F)$ for
the 30 proteins (correlation coefficient = -0.2437;
p < 0.2). On the other hand, $\ln(k_F)$ showed a high
negative correlation with the average clustering
coefficient of the corresponding LINs ($C_{LIN}$). Since
the clustering coefficient depends on the degree of the
node, we plot, in Figure 5, $C_{LIN} * k_{max}$ against
$\ln(k_F)$ for all the proteins. The plot shows a
significantly high negative correlation (correlation
coefficient = -0.7712; p < 0.0001) between $C_{LIN}$ and
the rate of folding for these single-domain, two-state
folding proteins. Fig. 5 also shows that neither Type I
nor Type II Random Controls show any correlation with
the rate of folding of the corresponding LINs.
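
A correlation coefficient of this kind is a Pearson correlation between a network parameter and $\ln(k_F)$ across proteins. The sketch below shows the calculation in plain Python. It is illustrative only: the function names are hypothetical, and the accompanying t statistic (to be compared against a Student's t distribution with n − 2 degrees of freedom) is a standard significance test assumed here, since the text does not state how the p values were obtained.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero correlation (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With the reported correlation of −0.7712 over 30 proteins, `t_statistic(-0.7712, 30)` evaluates to approximately −6.4.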

$C_{LIN}$ enumerates the number of loops of length three
in the Long-range Interaction Network. Thus $C_{LIN}$
essentially correlates with the number of 'distant' amino
acids (nodes), those separated by at least 12 other amino
acids along the backbone, that are brought into mutual
'contact' with each other in the native-state structure of
the protein. Understandably, the more such long-range
mutual contacts must be made in order to achieve the native
state, the more time is taken to fold, and hence the slower
the rate of folding. Interestingly, our result shows that
this feature is completely neutralised through the
short-range contacts in the PCNs. It may be mentioned that
a comparable correlation (−0.7574; p < 0.0001) is observed
between the Contact Order (CO) of these 30 proteins and
their $\ln(k_F)$. It is interesting to note that, despite
the dissimilar quantities that CO and $C_{LIN}$ measure,
the similar correlation coefficients indicate the important
role of long-range contact formation in the rate of
folding.

FIG. 6: Positive correlation between the rate of folding,
$\ln(k_F)$, and the Coefficient of Assortativity ($r$) of LINs. The
trendline is also shown.



Coefficient of Assortativity and Rate of Folding



Unlike the clustering coefficients, the protein networks
show a high coefficient of assortativity ($r$) at both
length scales (i.e., for the PCNs and their LINs). In
Fig. 6, the rate of folding of the proteins is plotted as a
function of the coefficient of assortativity of their LINs.
There is an increasing trend of $\ln(k_F)$ with increase in
$r$. The five α proteins, all having high rates of folding,
do not follow the trend very well. The correlation
coefficient between the rate of folding ($\ln(k_F)$) and
$r$ of the LINs, excluding the five α proteins, is 0.6981
(p < 0.0005). The same for the PCNs is calculated to be
0.5943 (p < 0.005). The result implies that, along with
showing assortative mixing, the PCNs and their LINs both
show significant positive correlations with the rate of
folding. Thus, the generic property of assortative mixing
in proteins tends to contribute positively towards their
kinetics of folding, and is fairly independent of the
short- and long-range nature of the interactions. Here also
the Type I Random Controls, owing to their coefficients of
assortativity being clustered around zero (Fig. 4(b)), do
not show any correlation with the rate of folding. As
expected from Figs. 3 and 4, the Type II random controls,
on the other hand, are scattered owing to the partial gain
in assortativity, though they do not show any definite
trend with the rate of folding.



IV. DISCUSSION 

In recent years, much interest has been seen in the study
of the structure and dynamics of networks, with application
to systems of diverse origins such as society, technology,
and biology [30, 31]. The aim of these studies has been to
identify the common organizational principles within this
wide variety of systems, and to identify general network
parameters that can correlate with the structure, function,
and evolution of each of the specific processes. Of these,
biological networks are of special interest as they are
products of a long evolutionary history. The protein
contact network is unique among intra-cellular networks in
its method of synthesis as a linear chain of amino acids,
which then folds into a stable three-dimensional structure
through short- and long-range contacts among the residues.
In this study, our aim is to understand whether the general
network parameters can offer any clue to the biophysical
properties of the existing three-dimensional structure of a
protein, thereby reflecting the commonalities in network
organization in general.

Our coarse-grained complex network model of protein
structures uncovers, for the first time in a naturally
evolved biological system, the interesting and exceptional
topological feature of assortativity at both short and
long length scales of contacts. The assortative nature is
found to be a generic feature of protein structures. We
show that the assortativity correlates positively with the
folding mechanisms at both length scales. This feature
corroborates the known fact that the folding mechanisms are
largely independent of the finer details of the protein
structure [18]. Since strongly cooperative mechanisms are
necessary to bring the protein into its native conformation
within a very short time [3], we have shown that
assortative mixing contributes positively towards speeding
up the folding process at different contact-length scales.
The generality of assortative mixing in PCNs assumes
greater importance in the light of the debate on whether
protein folding kinetics is under evolutionary control
[32, 33, 34]. Given the genetic basis and mode of formation
of protein chains, the signature of assortativity as an
indicator of the rate of folding is clear.

We also delineate the difference in the property of clus- 
tering of the nodes in the native structure at short and 
long length scales. The PCNs have high degree of cluster- 
ing, which contributes to their 'small-world' nature help- 
ing in efficient and effective dissipation of energy needed 
in their function JJ., J^]. Our results show that, in con- 
trast, the corresponding LINs have significantly lower 
and distributed clustering coefficients (Fig. [1]), and they 
show a negative correlation with the rate of folding of the 
proteins (Fig. [5]). This indicates that clustering of amino
acids that participate in the long-range interactions into
'cliques' can slow down the folding process, possibly due
to the backbone connectivity and steric factors. However,
the clustering coefficients of the PCNs do not show any
significant correlation with the rate of folding, suggesting
that the short-range interactions may play a constructive
and active role in determining the rate of the folding
process by reducing the negative contribution of the LINs.
Our results thus show that the separation of the types of
contacts in the PCNs and LINs delineates the length scale
of contacts that play a crucial role in protein folding. It
was recently shown that the CO of the Transition State
Ensemble (TSE) is highly correlated with that of the native
state structure, and that both correlate equally well with
the rate of folding [35]. This has been attributed to the
fact that the long-range contacts are mainly located in the
structural core and are formed early in the folding process,
and the formation of such contact networks leads to the
inverse correlation with the folding rates. Our results with
general parameters of the long-range interaction networks
(C_LIN and r_LIN) corresponding to the native PCNs also
reflect the crucial role that long-range interactions play
in the rate of folding.
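
As an illustration of the two length scales discussed above, the following minimal Python sketch builds a toy contact network, derives its long-range subnetwork by discarding contacts between residues that are close in sequence, and computes the average clustering coefficient of each. The function names, the toy contact list, and the sequence-separation cutoff `min_separation` are illustrative assumptions, not the code or the values used in this study.

```python
from itertools import combinations


def build_adjacency(n_residues, contacts):
    """Undirected adjacency sets from (i, j) residue-index pairs."""
    adj = {i: set() for i in range(n_residues)}
    for i, j in contacts:
        if i != j:  # ignore self-contacts
            adj[i].add(j)
            adj[j].add(i)
    return adj


def long_range_subnetwork(adj, min_separation):
    """Keep only contacts whose residues are at least `min_separation`
    apart along the chain (the cutoff value is an assumption here)."""
    return {
        i: {j for j in nbrs if abs(i - j) >= min_separation}
        for i, nbrs in adj.items()
    }


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes.

    For a node with k >= 2 neighbours, the local coefficient is the
    fraction of the k(k-1)/2 neighbour pairs that are themselves linked.
    Nodes with fewer than two neighbours contribute 0 (a convention).
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Toy 6-residue chain: backbone contacts plus three extra contacts.
toy_contacts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
                (0, 2), (0, 5), (1, 5)]
pcn = build_adjacency(6, toy_contacts)
lin = long_range_subnetwork(pcn, min_separation=3)

print("C_PCN =", average_clustering(pcn))
print("C_LIN =", average_clustering(lin))
```

In this toy case the long-range subnetwork retains only the contacts (0, 5) and (1, 5), so it contains no triangles and its clustering coefficient drops to zero, while the full network keeps a non-zero value. Real PCNs would be built from atomic coordinates with a distance cutoff, which this sketch does not attempt.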

After its synthesis in the cell, folding of the amino
acid chain is important for attaining the structure required
to reach a functional state as soon as possible.
This happens through inter-residue non-covalent interactions
at many length and time scales. The folded structure
has to confer stability, provide regions for binding ligands
of specific shapes and sizes, transmit the information of
binding/unbinding to other parts of the protein, and act as
a scaffold that retains the functional regions along with the
shape suitable for the protein's function. It is likely that
many of these properties require opposing features
to operate at different time and space scales. For example,
the 'small-world' nature (high clustering) of the native
protein structure is useful in the inter-residue signalling
required for its function in binding and allostery. On
the other hand, the long-range interaction networks have
reduced clustering, which may facilitate communication
among distant residues in the native structure to some
extent, but such a feature can also increase the folding
time as it requires distant residues in the chain to come
closer during the folding process. Thus, the evolved native
structures of the proteins show differential levels of
clustering at the two length scales. Assortative mixing,
on the other hand, helps in enhancing the folding process
at both length scales.

A large number of networks of diverse origin have been
found [21] to be of disassortative nature, and questions
regarding the origin of this property, and whether it
is a universal property of complex networks, have been
adjudged as "one of the ten leading questions for network
research" [36]. Our discovery of assortativity in the
amino acid networks in protein structures at short and
long contact scales questions the invoked generality of
this property in natural networks. The assortative nature
of social networks has been claimed to originate
from their unusually high clustering coefficients and
community structure [37]. In proteins, LINs have high
assortativity without necessarily having high clustering
coefficients. It would be interesting to study whether the
secondary structures play any role in shaping the "community
structure" in these molecular networks that helps
in conferring assortative mixing at both contact length
scales [37, 38].
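
For readers who wish to check the sign of degree mixing on their own networks, the coefficient of assortativity can be sketched as the Pearson correlation between the degrees found at the two ends of each link, with every undirected link counted in both directions. The function name and the toy edge lists below are illustrative assumptions and are not taken from the data of this study; the sketch also assumes a simple graph (no repeated links or self-loops).

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link.

    `edges` is a list of unique undirected (u, v) pairs without
    self-loops. Returns a value in [-1, 1]; positive values indicate
    assortative mixing, negative values disassortative mixing.
    Returns NaN when all link ends have the same degree, since the
    correlation is then undefined.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count each undirected link in both directions so that the
    # two degree sequences are symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs.extend((degree[u], degree[v]))
        ys.extend((degree[v], degree[u]))

    n = len(xs)
    mean = sum(xs) / n  # same mean for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float("nan")
    return cov / var


# A star (one hub, three leaves) is maximally disassortative.
star = [(0, 1), (0, 2), (0, 3)]
# A triangle plus a separate single link: every link joins equal degrees.
like_with_like = [(0, 1), (1, 2), (2, 0), (3, 4)]

print("r(star) =", degree_assortativity(star))
print("r(like-with-like) =", degree_assortativity(like_with_like))
```

Using the degree rather than the "remaining degree" at each link end gives the same coefficient, because the Pearson correlation is unchanged when both variables are shifted by a constant.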

Disassortative mixing observed in certain biological 
networks (the metabolic signaling pathways network and the
gene regulatory network) is conjectured to be responsible
for decreasing the likelihood of crosstalk between
different functional modules of the cell, and increasing
the overall robustness of a network by localizing effects
of deleterious perturbations [39]. In contrast to these two
networks, PCNs are not disassortative. For the PCN, one 
may put forward the possibility of the backbone chain 
connectivity as a means of conferring greater robustness 
against perturbations. 

From computational studies, it has been observed [21,
29] that assortative networks percolate easily, i.e., information
is transferred through the network more easily
than in disassortative networks. Protein
folding is a cooperative phenomenon, and hence communication
amongst nodes is essential, so that appropriate
noncovalent interactions can take place to form the
stable native state structure [3]. Thus percolation of
information is essential and could lead to the
observed cooperativity and fast folding of the proteins.
Hence the assortative mixing observed in proteins could be
an essential prerequisite for facilitating the folding of
proteins.